(54) Image retrieving method and apparatus

(57) By sequentially inputting images for each frame,
sequentially extracting features from the input frame images,
converting the features sequentially extracted into a feature
series corresponding to the input frame image series,
compressing the feature series in the direction of the time
axis, storing the compressed feature series in the storage,
sequentially extracting features separately from the images to
be retrieved for each input frame, sequentially comparing the
features of the images to be retrieved for each frame with the
stored compressed feature series, storing the progress state of
this comparison, updating the stored progress state of the
comparison on the basis of a comparison result with the frame
features of the succeeding images to be retrieved, and
retrieving image scenes matching with the updated progress
state from the images to be retrieved on the basis of the
comparison result between the updated progress state and the
features of the images to be retrieved for each frame, the
present invention can retrieve video images on the air or video
images in the data base at high speed and enables self
organisation of video to be classified and arranged on the
basis of the identity of partial images of video.
Description

BACKGROUND OF THE INVENTION 

FIELD OF THE INVENTION

The present invention relates to a method and apparatuses for
retrieving video images on the air, in a data base, or
elsewhere, and more particularly to a video image retrieving
method and apparatuses for performing high-speed retrieval with
the help of features of video images.

DESCRIPTION OF THE PRIOR ART 

Recently, multi-media information processing systems can store
various types of information such as video and text and present
them to users. However, retrieval of such information mainly
relies on a language-based method such as keywords. In this
case, a keyword assigning operation is necessary, and it is
extremely expensive to assign a keyword to each frame of video
having a large amount of information. Furthermore, since
keywords are freely assigned by the data base constructor,
there is a problem that when the viewpoint of a user differs
from that of the data base constructor, the keywords are
useless. In these circumstances, there is a demand for
retrieval based on features unique to images in addition to
keywords. However, to retrieve information on the basis of
image features, an art is necessary for comparing, at high
speed, the features of video comprising an enormous number of
frames with the features of the queried image. As a high-speed
comparison art applicable only to video images, "Video
retrieving method and apparatuses therefor" is proposed in
Japanese Patent Application Laid-Open 7-114567. This method
does not compare all the frames but compares only the images at
cut changes so as to reduce the processing amount. By doing
this, a speed high enough for comparing images on the air is
realized. On the other hand, there is a problem that a scene
comprising only one cut, or a scene whose cut change timing
varies due to editing before or after it, cannot be compared
satisfactorily. Furthermore, during retrieval, scenes other
than the scene specified as a retrieval key are not searched,
in the same way as with other general data base systems, so
that whenever scene retrieval becomes necessary, a very large
amount of video information must be compared again from
beginning to end. The scene comparison process includes a
number of processes that are performed in common even if the
scene to be retrieved differs, such as the feature extraction
and reading processes, and repeating such processes is
wasteful.



SUMMARY OF THE INVENTION 

An object of the present invention is to solve the
aforementioned problems and to provide an image retrieving
method for comparing, at high speed, the features of a target
image to be retrieved with the features of a sample image
prepared for query, without performing a keyword assigning
operation for image retrieval, and for detecting the same
segment with frame accuracy. The target image may be on the air
or in the data base.

Another object of the present invention is to provide a method
for detecting, in the same way and at the same time as the
target image is input, the same scene existing in the target
image regardless of whether it is specified as a retrieval key
beforehand.

Still another object of the present invention is to provide a
video camera for comparing, when recording an image series
inputted from moment to moment during picking up of images,
those images with recorded images and recording them in
association with matched images.

To accomplish the above objects, the present invention is a
signal series retrieving method and apparatuses therefor in an
information processing system comprising a time sequential
signal input means, a time sequential signal process
controller, and a storage, wherein the method and apparatuses
sequentially input time sequential signals, sequentially
extract features in each predetermined period of the inputted
time sequential signals, convert the features sequentially
extracted into a feature series corresponding to the inputted
predetermined period series, compress the feature series in the
direction of the time axis, store the compressed feature series
in the storage, sequentially extract features from the time
sequential signals to be retrieved in each predetermined period
of the inputted time sequential signals, sequentially compare
the features of the time sequential signals to be retrieved in
each predetermined period with the stored compressed feature
series, store the progress state of the comparison, and
retrieve a signal series matching with the progress state from
the time sequential signals to be retrieved on the basis of the
comparison result between the stored progress state of the
comparison and the features of the time sequential signals to
be retrieved in each predetermined period.

More concretely, the present invention divides a video image to
be compared into segments so that the feature of each frame in
a segment stays within a variation width of a specific range,
extracts one or a plurality of features in each segment, stores
it or them in correspondence with the address information
indicating the position of the segment in the image, then
sequentially inputs frame images one by one from video images
to be retrieved, and, when the feature series at an optional
point of time in which the features of the frame images are
sequentially arranged and the feature series in which the
features in the segments constituting the stored images are
sequentially arranged in each segment length have portions
equal to or longer than a specific length which can be decided
to be mutually equivalent, detects the portions as the same
image. In this case, when they are equivalent from the top of a
segment, the present invention obtains the address information
corresponding to the segment, and when they are decided to be
equivalent from halfway through a segment, the present
invention obtains the relative position from the top of the
segment, and outputs a corrected value of the address
information corresponding to the segment as a retrieval result.
Furthermore, the present invention collects a frame image
series inputted as a retrieval target into segments so that the
features of the frames stay within a variation width of the
specific range, extracts one or a plurality of features in each
segment, also stores the information corresponding to the
address information indicating the position of the segment in
the target image, and adds it to the target images to be
compared next. Furthermore, with respect to the inputted
feature series, when there are a plurality of video portions
which are detected to be the same, the present invention groups
them, associates them with each other, and stores them.

An apparatus realizing the aforementioned retrieving method
comprises a means for dividing an optional image into segments
so that the feature of each frame in a segment stays within a
variation width of the specific range, a means for extracting
one or a plurality of features in each segment, a means for
storing it or them in correspondence with the address
information indicating the position of the segment in the
image, a means for sequentially inputting frame images one by
one from images to be retrieved, a means for retaining the
feature series at an optional point of time in which the
features of the frame images are sequentially arranged, a means
for generating the feature series in which the features in the
segments constituting the stored images are sequentially
arranged in each segment length, and a means for deciding
whether the feature series have portions equal to or longer
than the specific length which can be decided to be mutually
equivalent. The present invention also has a means for
obtaining, when they are decided to be equivalent from the top
of a segment, the address information corresponding to the
segment, obtaining, when they are decided to be equivalent from
halfway through a segment, the relative position from the top
of the segment, and outputting a corrected value of the address
information corresponding to the segment as a retrieval result.
Furthermore, the present invention has a means for collecting a
frame image series inputted as a retrieval target into segments
so that the features of the frames stay within a variation
width of the specific range, a means for extracting one or a
plurality of features in each segment, and a means for also
storing the information corresponding to the address
information indicating the position of the segment in the
target image and adding it to the target images to be compared
next. Furthermore, with respect to the inputted feature series,
when there are a plurality of scenes which are detected to be
the same, the present invention has a means for grouping them,
associating them with each other, and storing them.

The foregoing and other objects, advantages, manner of
operation and novel features of the present invention will be
understood from the following detailed description when read in
connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a system for executing an
embodiment of the present invention.

Fig. 2 is a block diagram of a process for executing an
embodiment of the present invention.

Fig. 3 is a schematic view showing the feature extracting
method of an embodiment of the present invention.

Fig. 4 is a schematic view showing the feature comparing method
of an embodiment of the present invention.

Fig. 5 is a drawing showing an example of feature comparison
flow of an embodiment of the present invention.

Fig. 6 is a schematic view showing an example of the
conventional comparing method.

Fig. 7 is a schematic view for explaining the comparing method
of an embodiment of the present invention.

Fig. 8 is a schematic view for explaining the comparing method
of an embodiment of the present invention.

Fig. 9 is a block diagram of a process for executing an
embodiment of the present invention.

Figs. 10A and 10B are flow charts of an embodiment of the
present invention.

Fig. 11 is a drawing showing the feature table structure used
in an embodiment of the present invention.

Fig. 12 is a drawing showing the candidate list structure used
in an embodiment of the present invention.

Fig. 13 is a drawing showing the candidate structure used in an
embodiment of the present invention.

Fig. 14 is a drawing showing the retrieval result table and
retrieval segment structure used in an embodiment of the
present invention.

Fig. 15 is a schematic view of a video recorder system applying
an embodiment of the present invention.

Fig. 16 is a drawing showing a display screen example during
image retrieval of self organization of video by the present
invention.

Fig. 17 is a drawing showing a display screen example during
image retrieval of self organization of video by the present
invention.

Fig. 18 is a drawing showing a display screen example during
image retrieval of self organization of video by the present
invention.

Fig. 19 is a schematic block diagram when the present invention
is applied to a video camera.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

An embodiment of the present invention will be 
explained hereunder by referring to the drawings. 

Fig. 1 is an example of a schematic block diagram of the system
configuration for realizing the present invention.

Numeral 1 indicates a display such as a CRT, which displays an
output screen of a computer 2. When the output of the computer
is voice, the computer 2 outputs it via a speaker 13. An
instruction to the computer 2 can be issued using a pointing
device 3 and a keyboard 4. A video reproducing apparatus 5 is
an optical disk or a video deck. A video signal outputted from
the video reproducing apparatus 5 is sequentially converted to
digital image data by a video input device 6 and sent to the
computer. In certain circumstances, an image on the air can be
fetched, and a video signal from a broadcast receiver 7 is
inputted to the video input device 6. When a video server
recording images as digital data or digital video is used
instead of the video reproducing apparatus 5, the video input
device 6 is unnecessary, or it controls a function for
expanding compressed and recorded image data and converting it
to uncompressed image data. If the broadcast is of a digital
system, the same applies to the broadcast receiver 7. Inside
the computer, digital image data is inputted to a memory 9 via
an interface 8 and processed by a CPU 10 according to a program
stored in the memory 9. When video handled by the CPU 10 is
sent from the video reproducing apparatus 5, a number (frame
No.) is sequentially assigned to each frame image starting from
the top of the video. When a frame number is sent to the video
reproducing apparatus via a control line 11, the apparatus can
be controlled so as to reproduce the video of the corresponding
scene. When video is sent from the broadcast receiver 7, no
frame number is assigned, so that the apparatus records, as
required, a sequence number or time starting from 0 at the
process start time and uses it instead of the frame number.
Various kinds of information can be stored in an external
information storage 12 as required by the internal process of
the computer. Various data created by the process explained
hereunder is stored in the memory 9 and referred to as
required.

Fig. 2 is an overall block diagram outlining the image
retrieval process of the present invention. This process is
executed inside the computer 2. The process program is stored
in the memory 9 and executed by the CPU 10. Hereunder, the
process will be explained on the assumption that each unit is
described as a software procedure to be executed by the CPU 10.
However, needless to say, a function equivalent to this
procedure can be realized by hardware. In the following
explanation, the processes performed by the software are drawn
as blocks for convenience. Therefore, for example, in Fig. 2,
the input unit for queried image indicates an input process for
a queried image. In this embodiment, an image of the scene to
be found (hereinafter called a queried image) 100 is
sequentially inputted for each frame by an input unit for
queried image 102 beforehand, prior to retrieval, and
temporarily stored in the memory 9. A frame feature extractor
106 extracts a feature 108 from a frame image 104 in the memory
9. A feature table generator 110 pairs up the feature and the
top frame number for each segment of the string of features in
which the feature stays within the allowable variation range,
creates a feature table 112, and records it in a storage 114.
An image 116 to be retrieved is likewise sequentially inputted
for each frame by an input unit for target image to be compared
118, in the same way as the queried image, and temporarily
stored in the memory 9. A frame feature extractor 122 extracts
a feature 124 from a frame image 120 in the memory 9. In this
case, the frame feature extractor 122 performs exactly the same
process as the frame feature extractor 106. A feature
comparator 130 compares the newest time sequential array of the
features 124 sequentially sent from the frame feature extractor
122 with a stored feature table 300 (the data content is the
same as that of the feature table 112) for consistency. The
progress state of the comparison is stored in the storage 126
in the form of a candidate list 400, which will be described
later, and updated every time a new frame is input. If the
features are consistent with each other, the image segment
corresponding to the feature table is outputted to a storage
128 or another processor as a retrieved result table 600, which
will be described later. If any name and attribute are
associated with the retrieved image in this case, it is
naturally possible to output the name and attribute.

Next, the process performed by each unit mentioned above will
be explained in more detail.

Fig. 3 shows a series of flow (100 to 114) from input of a
queried image to creation of a feature table. The object of
this process is to compress queried images to the minimum
quantity of information which can represent their features, so
that more types of queried images can be stored and compared in
real time at one time. Concretely, features are first extracted
from the sequentially inputted frame images. Here, the feature
is explained as information which can be represented by several
bytes, such as the mean color of the whole frame image. In
addition, generally known patterns such as the shape of
boundary lines and the texture of a specific image can be
widely applied as features. Furthermore, the time sequential
array of the obtained features is collected into segments
within the allowable variation range, and each segment is
represented by one feature. A' or A'' shown in the drawing
indicates that, taking A as a standard, the absolute value of
the difference of the feature value of A' or A'' from that of A
is less than a specific threshold value. Frame numbers are
sequentially assigned to the frames of the inputted images,
such as t1, t2, t3, ..., the frame numbers ti, tj, tk, ... of
the top frame of each segment and the features A, B, C, ... are
paired up, and a list of these pairs is generated as a feature
table. Video comprises 30 frame images per second, so that,
although it depends on the kind of image to be searched for,
assuming that the mean segment length is 10 frames, a
permutation pattern comprising 10 or more features can be
obtained even from a scene of only several seconds.
Furthermore, if the length of each segment is added to the
restrictions, the number of permutations and combinations of
feature tables becomes extremely large, and a performance
sufficient for specifying one scene even among many images can
be expected.
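
The segment-wise compression described above can be illustrated
with a minimal sketch. It assumes a single scalar feature per
frame and takes the first frame of each segment as the standard
value; the function name build_feature_table and the parameter
threshold are hypothetical and do not appear in the
specification.

```python
from typing import List, Tuple


def build_feature_table(features: List[float],
                        threshold: float) -> List[Tuple[int, float]]:
    """Compress a per-frame feature series into (top frame number, feature) pairs.

    Assumption: the feature of the first frame of a segment is the
    standard value, and a frame stays in the segment while its absolute
    difference from that standard is less than `threshold`.
    """
    table: List[Tuple[int, float]] = []
    for frame_no, value in enumerate(features, start=1):  # frame numbers t1, t2, ...
        # Open a new segment when there is none yet or the value has
        # left the allowable variation range of the current segment.
        if not table or abs(value - table[-1][1]) >= threshold:
            table.append((frame_no, value))
    return table


table = build_feature_table([10, 11, 10.5, 20, 21, 30], threshold=2)
print(table)
```

Six frames are reduced here to three table entries, one per
segment, each holding the top frame number and the
representative feature.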

Fig. 4 schematically shows the comparison (the feature
comparison process 130) between the video image to be retrieved
and the queried image stored beforehand. As mentioned above,
with respect to target images to be retrieved, frame images are
sequentially inputted and features are extracted (116 to 124).
On the other hand, for the queried images compressed in the
form of a feature table, the features are arranged over the
length of each segment, and the feature series is returned from
run-wise to frame-wise form during comparison (130). In the
comparison, a queried image having a feature series that
matches, over a length exceeding a specific threshold value,
the feature series of the target image ending at the newest
frame just inputted is returned as a retrieved result. In this
case, not only a complete match but also a partial match of the
feature series is detected, and when the length of the matched
part exceeds the threshold value, it is also returned as a
retrieved result. By doing this, a scene whose length differs
slightly due to editing can also be correctly retrieved.
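
Returning a feature table from run-wise to frame-wise form can
be sketched as follows. This is only an illustration: the
function name expand_feature_table is hypothetical, and the
total frame count is passed in explicitly because a table of
top frame numbers alone does not give the length of the last
segment.

```python
from typing import List, Tuple


def expand_feature_table(table: List[Tuple[int, float]],
                         total_frames: int) -> List[float]:
    """Expand (top frame number, feature) pairs into one feature per frame.

    Assumption: frame numbers are 1-based and `total_frames` is the
    number of the last frame of the queried image.
    """
    series: List[float] = []
    for i, (top, value) in enumerate(table):
        # A segment runs up to the frame before the next segment's top
        # frame, or up to the last frame for the final segment.
        next_top = table[i + 1][0] if i + 1 < len(table) else total_frames + 1
        series.extend([value] * (next_top - top))
    return series


print(expand_feature_table([(1, 10), (4, 20), (6, 30)], total_frames=6))
```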

Fig. 5 shows the comparison process of the present invention in
more detail. If feature series of indefinite length as
mentioned above are compared naively, it is necessary to repeat
the comparison on the assumption of various frame lengths, as
shown in Fig. 6, whenever a frame image is newly inputted from
the target image. The number of inter-frame comparisons in this
case is enormous, as shown in the drawing, and such a
comparison process is not suited especially to real-time
comparison in which new frames are inputted one after another
at a rate of one per 1/30 second. The reason is that the
comparison process for each frame input is executed quite
independently of the previous comparison process, and even if a
match of a certain length has been ascertained by the
immediately preceding process, that information cannot be
applied to the next comparison process. Therefore, the present
invention takes an approach that reduces the comparison process
to be performed for one frame input and performs the comparison
stepwise so that each frame input supplements the previous
process. Concretely, the comparison is executed as indicated
below.

(1) When a frame is inputted from the target image, the queried
image is searched for frames having the same feature as that of
the inputted frame, and all the frames found are temporarily
stored as candidates.

(2) When the next frame is inputted from the target image, it
is checked whether the feature of that frame matches the
feature of the frame immediately after each frame stored as a
candidate immediately before.

(3) When they match, the frame is kept as a candidate together
with the frame stored as a candidate immediately before; when
they do not match, the frame is excluded from the candidates.
Frames having the same feature as that of the just inputted
frame are newly added as candidates, as in (1). If a frame
excluded from the candidates has remained consistent up to that
time for a length (the number of frames) exceeding the specific
threshold value, the matched segment with that frame at the top
is outputted as a retrieved result.

(4) The aforementioned operations are repeated.
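
Steps (1) to (4) can be sketched as a small incremental matcher.
This is a sketch under stated assumptions, not the embodiment
itself: features are scalars, the query is held frame-wise, and
the class name IncrementalMatcher and the parameters min_length
and tolerance are hypothetical. Every candidate that ends with
sufficient length is reported, so a caller that wants only the
longest segment would filter the returned list.

```python
from typing import List, Tuple


class IncrementalMatcher:
    """Frame-by-frame matcher that keeps its progress as a candidate list."""

    def __init__(self, query: List[float], min_length: int,
                 tolerance: float = 0.0) -> None:
        self.query = list(query)      # frame-wise features of the queried image
        self.min_length = min_length  # hypothetical minimum match length (frames)
        self.tolerance = tolerance    # hypothetical allowable feature difference
        # Each candidate is (query start index, target start frame, length so far).
        self.candidates: List[Tuple[int, int, int]] = []
        self.frame_no = 0             # 1-based number of the newest target frame

    def _same(self, a: float, b: float) -> bool:
        return abs(a - b) <= self.tolerance

    def feed(self, feature: float) -> List[Tuple[int, int, int]]:
        """Consume one target frame.

        Returns the matches that ended at this frame as
        (target start frame, query start frame, length), all 1-based.
        """
        self.frame_no += 1
        results: List[Tuple[int, int, int]] = []
        survivors: List[Tuple[int, int, int]] = []
        # Steps (2) and (3): extend each stored candidate by one frame,
        # or drop it and report it when it was long enough.
        for q_start, t_start, length in self.candidates:
            nxt = q_start + length
            if nxt < len(self.query) and self._same(self.query[nxt], feature):
                survivors.append((q_start, t_start, length + 1))
            elif length >= self.min_length:
                results.append((t_start, q_start + 1, length))
        # Step (1): every query frame with the same feature becomes a
        # new candidate starting at the current target frame.
        for q, q_feature in enumerate(self.query):
            if self._same(q_feature, feature):
                survivors.append((q, self.frame_no, 1))
        self.candidates = survivors
        return results


A, B, C, D, X = 1.0, 2.0, 3.0, 4.0, 9.0
matcher = IncrementalMatcher([A, A, A, A, B, B, C, D], min_length=6)
for f in [X, X, A, A, B, B, C, D, X]:
    found = matcher.feed(f)
print(found)
```

For the query series A A A A B B C D and the target series
X X A A B B C D X with min_length set to 6, the ninth call to
feed reports one match of length 6 that starts at target frame
3 and query frame 3. The work per input frame is one check per
stored candidate plus one scan of the query features.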

The comparison principle of the present invention will be
concretely explained hereunder by referring to the example
shown in Fig. 5.

First, consider the frame (1), a new frame inputted from the
target image, in which the feature X is obtained. Since the
feature X does not exist in the queried image, nothing is
performed. The same applies to the frame (2). When the frame
(3) is inputted and the feature A' is obtained, the feature A
matching A' exists in the queried image, so that all the frames
① to ④ having the feature A in the queried image are set as
candidates. Depending on the features of the frames inputted
hereafter from the target image, any of these candidate frames
may turn out to be the top of a segment that becomes a scene to
be retrieved. In the lower table shown in Fig. 5, ① to ④
written on the line of the frame (3) indicate the frames in the
queried image which are selected as candidates at this point of
time. In the next frame (4), the feature A' is again obtained.
First, for all the frames selected as candidates at the
preceding step, it is checked whether the next frame matches in
feature. As a result, the frames ① to ③ match in feature, but
the frame ④ does not, because the feature of its next frame ⑤
changes to B. The x marked on the fourth line in the table
indicates this, and the frame ④ selected as a candidate in the
frame (3) is excluded from the candidates at this point of
time. At the same time, as candidates in the frame (4), ① to ④,
the same as those of (3), are newly added on the fourth line in
the table. Although the frames ① to ④ added on the line (3) are
the same as the frames ① to ④ added on the line (4), they are
handled as different comparison candidates. Furthermore, B is
obtained in the frame (5), and ① and ② selected as candidates
in (3) and ① to ③ selected as candidates in (4) are excluded
from the candidates. In the same way, ⑤ and ⑥ are selected as
candidates at this point of time. When the aforementioned
process is repeated whenever a frame is inputted from the
target image, the only candidates matching continuously up to
the step of the frame (8) are ③ selected as a candidate in (3),
④ selected as a candidate in (4), ⑤ selected as a candidate in
(5), ⑥ selected as a candidate in (6), and ⑦ selected as a
candidate in (7). At the point of time that the frame (9) is
inputted and no further match can be made, it is found that the
frames (3) to (8) of the target image and the frames ③ to ⑧ of
the queried image form the longest matching segment. These
results agree with the comparison results obtained when the
scenes are compared by sequentially changing the length with
the frame (8) as the starting point, using the conventional
method previously shown in Fig. 6. In the case of Fig. 6,
assuming that the number of frames of the queried images is n,
the number of inter-frame comparisons to be executed for every
frame input is n(n+1)(n+2)/6, as shown in Fig. 6, and its order
is O(n^3). According to this method, however, only the sum of
(1) the number of repetitions c of checking for a match between
the feature of the newly inputted frame and the feature of the
frame next to each candidate frame and (2) the number of
repetitions n of checking whether the same feature as that of
the newly inputted frame exists in the queried images is
required; generally n >> c, and the order is O(n). This
difference is caused by the use of an inductive method that
obtains the result for the current frame on the basis of the
processing result up to the immediately preceding frame. n can
be made smaller than the original number of frames by use of
the aforementioned feature table, and a still quicker
comparison can be expected. Furthermore, the retrieved result
can be clearly positioned with frame accuracy.
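
The two per-frame costs can be compared numerically with a short
sketch. The text gives only the closed form n(n+1)(n+2)/6; the
counting model below (every length L from 1 to n tried at every
one of the n-L+1 query offsets, at L frame comparisons each) is
an assumption that happens to reproduce that closed form, and
all function names are hypothetical.

```python
def brute_force_comparisons(n: int) -> int:
    """Frame comparisons per input frame under the assumed counting model:
    each length L = 1..n is tested at each of the n - L + 1 query offsets,
    costing L frame comparisons per test."""
    return sum(length * (n - length + 1) for length in range(1, n + 1))


def closed_form(n: int) -> int:
    """The count stated in the text, n(n+1)(n+2)/6."""
    return n * (n + 1) * (n + 2) // 6


def incremental_comparisons(n: int, c: int) -> int:
    """Per-frame cost of the incremental method: c candidate checks
    plus n checks for new candidates."""
    return n + c


n, c = 300, 10
print(brute_force_comparisons(n), closed_form(n), incremental_comparisons(n, c))
```

With n = 300 query frames the assumed brute-force model needs
4,545,100 frame comparisons per input frame, against 310 for
the incremental method when 10 candidates are alive.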

In the above explanation, a case of one queried image is
assumed. However, the principle can also be applied to a
plurality of queried images without trouble. For the comparison
at every frame input, it suffices to repeat the aforementioned
process for the number of queried images. However, as shown in
Fig. 7, although the same image part is included in each of the
queried images, the parts may differ slightly in length due to
different ways of editing. In the drawing, three kinds, ①, ②,
and ③, are shown. The same applies to a case where a plurality
of the same image parts are included in one queried image. When
it is only necessary to know whether there is a matched part in
the queried image, no problem arises. However, depending on the
object of retrieval, classification on the basis of the
accurate position and length of the matched segment may also be
required. In this case, it is necessary to clearly output, as a
retrieved result, which segment matches which segment. When
there is an overlapped part as shown in No. 2 and No. 3 in the
drawing, it is necessary to indicate the overlapped part in
consideration of the inclusion relationship. The method of the
present invention can handle this problem too at high speed
without changing the basic comparison principle. In the
comparison process of this method, as described above, when a
frame is inputted from the target image and its feature is
obtained, the group of frames having the same feature as that
of the target frame is selected as candidates from the queried
images. In this case, the matched segments whose top frames
were selected as candidates at the same time and which reach a
length exceeding the detection threshold value are images equal
to each other. In the example shown in Fig. 7, one segment
exists in each of the three queried images, and the top frames
of that segment in all the queried images are selected as
candidates at the same time when the frame corresponding to the
top of the segment is inputted from the target image. Although
other frames may also be selected as candidates at the same
time, they are excluded from the candidates before they reach a
length exceeding the detection threshold value. When the
candidates reach the end of the common segment and the next
frame is compared, the matched segments in the queried images
No. 1 and No. 3 are excluded from the candidates. The target
image still continues to match No. 2. However, the common
segment is decided for the present, and it is outputted as a
retrieved result that it is detected in the queried images
No. 1 to No. 3. Even after the common segment ends, the queried
image No. 2 remains as a candidate because the next frame still
matches the target image, and finally the longer segment is
decided. Even if there is a segment that begins earlier than
the common segment, like ①, the matched segment is detected and
decided in the same way. As mentioned above, according to the
method of the present invention, only by performing a brief
check when a segment is selected as a candidate or excluded
from the candidates, scenes with various variations slightly
different in length can be discriminated and detected
respectively, with the comparison processing amount for every
frame input kept small.
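
The idea that segments whose top frames were selected as
candidates at the same time are equal to each other can be
illustrated by grouping reported matches on the target frame at
which they started. This is only a sketch: the tuple layout,
the function name group_by_target_start, and the sample numbers
are hypothetical and are not taken from Fig. 7.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (query id, query start frame, target start frame, matched length)
Match = Tuple[str, int, int, int]


def group_by_target_start(
        matches: Iterable[Match]) -> Dict[int, List[Tuple[str, int, int]]]:
    """Group matches that began at the same target frame.

    Matches in one group started as candidates at the same moment, so
    they share the same image part for at least the shortest length
    found in the group.
    """
    groups: Dict[int, List[Tuple[str, int, int]]] = defaultdict(list)
    for query_id, query_start, target_start, length in matches:
        groups[target_start].append((query_id, query_start, length))
    return dict(groups)


matches = [
    ("No.1", 40, 100, 60),
    ("No.2", 10, 100, 90),   # continues to match after the others end
    ("No.3", 75, 100, 60),
    ("No.2", 5, 300, 45),
]
for start, members in group_by_target_start(matches).items():
    print(start, members)
```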

In the above explanation, a case that queried
images are prepared beforehand and then the target
image is retrieved is used. However, this method can be
applied even if the queried images are just target
images. Fig. 8 shows a conceptual diagram thereof.
Target images are inputted, and all of them are stored, and
they are handled as if they are the aforementioned
queried images. It can be realized by the block diagram
shown in Fig. 9. Although it is almost similar to the block
diagram shown in Fig. 2, the queried images are the
same as the target images, so that the process up to
extraction of frame features can be shared and the
frame feature 108 is distributed for storage and
comparison. By this mechanism, the part of target images
inputted past where the newest image part (3) inputted
from the target images appears can be detected at the
same time with input. If scenes appear several times
past, all of them are detected at the same time on the
aforementioned comparison principle, so that they are
collected, classified, and arranged for each detected
same scene. So to speak, self organization of video is
automatically realized in real time. For example, if the
present invention is applied to an apparatus for
recording TV programs for several weeks to which a memory
capacity for storing all TV programs for several weeks is
installed, the same image is generally outputted every
time at the opening of a program, so that by detecting
the image and collecting the images before and after it,
the programs can be arranged in real time at the same
time with recording. If it is found that there are a plurality
of same scenes, it is possible to leave only one image
and erase the residual images by leaving only pointers,
so that the use efficiency of media for recording can be
improved. Although also a commercial message is one
of images outputted repeatedly, to play back a recorded
program, the commercial message can be
automatically skipped as required. In this case, by use of the
commercial characteristic that the length is just 15
seconds or 30 seconds, the decision performance as to
whether it is a commercial message is improved.
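
The length test mentioned above can be illustrated with a short Python
sketch. It is only an illustration under stated assumptions: the
function name, the frame rate of 30 frames per second, and the
tolerance of half a second are not taken from the specification.

```python
def looks_like_commercial(length_frames: int,
                          fps: float = 30.0,
                          tolerance_s: float = 0.5) -> bool:
    """Return True if a repeated scene lasts about 15 or 30 seconds.

    fps and tolerance_s are assumed values chosen for illustration.
    """
    seconds = length_frames / fps
    return any(abs(seconds - target) <= tolerance_s
               for target in (15.0, 30.0))
```

A repeated scene of 450 frames passes the test at the assumed frame
rate, while one of 600 frames (20 seconds) does not.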

In the above explanation, the process of realizing
the block diagram shown in Fig. 9 can be represented
more concretely by the flow charts shown in Figs. 10A
and 10B. Also the process of realizing the block diagram
shown in Fig. 2 is self-evident from Figs. 10A and 10B.
In the above explanation, for simplicity, the feature of the
queried image is returned from the run-wise to the
frame-wise once and then compared. However, to make
the specification closer to the practical use, a method of
comparison in the run-wise state will be indicated
hereunder.

Firstly, at Step 200, the apparatus and various
variables are initialized. The variables mc and mm are set to
0. Next, a frame image is inputted from the target image
(Step 202) and the feature F is extracted from the frame
image (Step 204). The feature F uses the mean of
colors of all pixels existing in the frame image. The color
of each pixel is represented by the three components R,
G, and B, and with respect to the value of each
component, the values on the whole screen are averaged
respectively, and a set of three values (Ra, Ga, Ba) is
obtained, and this set is assumed as the feature F. If a
first frame is inputted, a feature table structure 300
shown in Fig. 11 is newly generated and F is written into
302 as a feature of the first segment (segment No. 1). In
this case, the frame number is also written into 304 as a
pair. The feature table generated like this will function
hereafter for the already mentioned queried image. In
this case, the variable mc indicating the maximum value
of the segments stored in the feature table structure 300
is incremented by one and the program is returned to
Step 202 as it is. On the other hand, if the second frame
or a subsequent frame is inputted, Step 206 is
executed. At Step 206, the feature FC of the newest
segment (the segment of the segment number mc-1) stored
in the feature table and the current feature F are
compared and it is decided whether the difference is smaller
than the threshold value CTH. In this case, although the
feature is a set of three values as mentioned above, only
when the differences between the three values are all
smaller than the threshold value CTH, it is represented
that the difference is smaller than the threshold value
CTH. If the difference is smaller than the threshold value
CTH, it is decided that the frame currently inputted can
be collected in the same segment as that of the just
prior frames and the program goes to Step 208. At Step
208, the loop counter i is reset to 0. i is incremented by
1 every time at Step 226 and Steps 210 to 224 are
repeated until i becomes larger than mm. In this case,
mm indicates the number of candidates at the stage of
continuous inspection among all images (stored as the
feature table 300) inputted until now on the assumption
that there is the possibility that the part is the same as
an image being newly inputted at present. A structure
500 for storing the status variable indicating the
inspection stage of each of all candidates is generated and
managed by a candidate list structure 400 as shown in
Fig. 12. Pointers to the candidate structure 500 are
stored in the candidate list structure 400 and
dynamically added or deleted during execution. Fig. 13 shows
the constitution of the candidate structure 500 and the
segment number when it is registered as a candidate is
stored as a starting segment number of comparison 502
and the segment number which starts from the segment
and is a target of comparison at present is stored as a
target segment number of comparison 504. A matching
frame number counter 506 indicates the repetition time
of matching since selected as a candidate, that is, the
matching segment length. A starting frame offset for
comparison 508 is a variable necessary for positioning
with the frame accuracy by performing comparison in
run-wise, which will be described later. Pointers to
starting candidates of simultaneous comparison 510
connect a group of candidates simultaneously registered to
each other in the connection list format and candidates
simultaneously registered can be sequentially traced by
referring to 510. At Step 210, the program checks
whether the comparison of the candidate i (indicated as
a means of the i-th candidate among the mm
candidates) is completed to the end of the segment which is
a comparison target at present. When the frame
number obtained by adding the matching frame number
counter 506 to the frame number of the segment
indicated by the starting segment number of comparison
502 reaches the frame number of the segment next to
the segment which is a comparison target at present, it
is found that the comparison reaches the end. If it does
not, the program increments the matching frame
number counter of the candidate i by one (Step 216)
and goes to Step 226. If it does, the program refers to
the feature of the segment following the segment which
is a comparison target at present and checks whether
the difference between the feature and F is smaller than
the threshold value STH (Step 212). If the difference is
smaller than the threshold value STH, the program
changes the segment to be compared to the next
segment and continues the comparison (Step 214). By
doing this, even if the segment changing location is
different from the input image, it can be stably compared.
This is a necessary process because, since a video
signal may be changed due to noise during image input
and characteristics of the apparatus, the changing point
of the segment is not always the same even if the same
image is inputted. The reason for use of the threshold
value STH which is different from the threshold value
CTH deciding the segment change timing is that the
change of an image is absorbed in the same way and a
stable comparison is executed. On the other hand, at
Step 212, when the difference is larger than the
threshold value STH, the program checks whether the
difference between the feature of the segment which is a
comparison target at present and the current feature F
is smaller than the threshold value STH (Step 218). If
the difference is smaller than the threshold value STH,
the program goes to Step 226 without doing anything.
The reason is that since a segment is selected as a
candidate not in frame-wise but in segment-wise and the
features do not always match with each other starting
from the top of the segment, while an input image
having the same feature as that of the segment which is a
comparison target at present is obtained, the program
only waits by positioning for the present. If the difference
is larger than the threshold value STH, it is regarded
that the features do not match with each other any
more. If the value of the matching frame number counter
of the candidate i is larger than the threshold value FTH
in this case (Step 220), the program outputs the
candidate i as a retrieved scene (Step 222). The program
deletes the candidate i from the candidate list (Step
224) and goes to Step 226.
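
The feature extraction of Step 204 and the segment test of Step 206
can be sketched in Python as follows. This is a minimal sketch and not
the implementation of the embodiment: the function names, the
representation of a frame as a non-empty sequence of (R, G, B) tuples,
and the representation of the feature table 300 as a list of
(feature, first frame number) pairs are assumptions made for
illustration.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, float]
Pixel = Tuple[int, int, int]


def mean_rgb_feature(pixels: Sequence[Pixel]) -> Feature:
    """Step 204: average R, G and B over every pixel of one frame."""
    n = len(pixels)  # assumed to be non-zero
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )


def is_close(a: Feature, b: Feature, threshold: float) -> bool:
    """True only when all three component differences are below threshold."""
    return all(abs(x - y) < threshold for x, y in zip(a, b))


def build_feature_table(features: Sequence[Feature],
                        cth: float) -> List[Tuple[Feature, int]]:
    """Collect consecutive similar frames into segments (run-wise form).

    Each entry pairs the feature of a segment (302) with the frame
    number of its first frame (304). A frame whose feature is not
    close to the feature FC of the newest segment opens a new entry.
    """
    table: List[Tuple[Feature, int]] = []
    for frame_number, f in enumerate(features):
        if not table or not is_close(table[-1][0], f, cth):
            table.append((f, frame_number))
    return table
```

With four frame features, of which the first two and the last two are
close to each other, the table holds two entries that start at frame
numbers 0 and 2. The same is_close helper can be reused with the
threshold value STH for the comparisons of Steps 212 and 218.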

At Step 206, if the difference is larger than the
threshold value CTH, it is decided that the currently
inputted frame cannot be collected in the same segment
as that of the previous frames and a new segment is
added to the feature table 300 (Step 228). In this case,
mc is incremented by one and F is substituted for FC. At
Step 230, the loop counter i is reset to 0. i is
incremented by one every time at Step 248 and Steps 232 to
246 are repeated until i becomes larger than mm. At
Step 232, the program checks whether the comparison
of the candidate i is completed to the end of the
segment which is a comparison target at present. This can
be obtained by the same method as that of Step 210. If
the comparison reaches the end, the program changes
the segment to be compared to the next segment (Step
234) and if it does not, the program does nothing. Next,
the program checks whether the difference between the
feature of the segment which is a comparison target at
present and the newest feature F is smaller than the
threshold value STH (Step 236). If the difference is
smaller than the threshold value STH, the program
increments the matching frame number counter of the
candidate i by one (Step 238) and goes to Step 248. If
the difference is larger than the threshold value STH,
the program checks not only one segment immediately
after the segment which is a comparison target at
present but also the following segments sequentially
and checks whether there is a segment having the
same feature as the current feature F (Step 240). If
there is, the program changes the next segment to a
segment to be compared, substitutes the difference
between the frame number of the segment and the
frame number which is attempted to compare at first for
the starting frame offset for comparison 508, and goes
to Step 248. Also the frame numbers do not always
match with each other starting from the top of the
segment, so that the positioning with the frame accuracy
can be executed by use of this offset. In this case, if the
size of the offset is larger than the segment length when
it is selected as a candidate, the program goes to Step
242 by the same handling as that when no matching
following segment is found. If it is not, it is equivalent to the
comparison started from a segment behind the
segment selected as a candidate first and in this case, it is
expected that in the comparison started from the rear
segment, a match is smoothly continued and the
processing is duplicated. If, when no matching following
segment is found, the value of the matching frame
number counter of the candidate i is larger than the
threshold value FTH (Step 242), the program outputs
the candidate i as a retrieved scene (Step 244). The
program deletes the candidate i from the candidate list
(Step 246) and goes to Step 248. When the process for
all the candidates ends, the program searches all
segments having the same feature as that of the currently
inputted frame image from the segments stored in the
feature table, generates a candidate structure having
these segments as comparison starting segments, and
adds it to the candidate list (Steps 250 to 256).
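
The candidate structure 500 and the handling of a candidate whose
match has ended (Steps 220 to 224 and Steps 242 to 246) can be
sketched in Python as follows. The class name, the field names, and
the use of a plain Python list for the candidate list structure 400
are assumptions made for illustration. Only the reference numerals in
the comments come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    start_segment: int                      # 502: segment at registration
    target_segment: int                     # 504: segment compared now
    matched_frames: int = 0                 # 506: matching frame counter
    start_offset: int = 0                   # 508: starting frame offset
    next_simultaneous: Optional["Candidate"] = None  # 510: linked peers


def drop_candidate(candidates: List[Candidate], i: int, fth: int,
                   results: List[Candidate]) -> Candidate:
    """Remove candidate i and report it if it matched more than fth frames."""
    cand = candidates.pop(i)          # Steps 224 and 246: delete from list
    if cand.matched_frames > fth:     # Steps 220 and 242: length test
        results.append(cand)          # Steps 222 and 244: output the scene
    return cand
```

A candidate that matched only a few frames is discarded silently,
whereas one that exceeded the threshold value FTH is appended to the
results before it leaves the list.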

At Steps 222 and 244 among the aforementioned
steps, the program not only outputs the information of a
found scene as it is but also can output it in the formats
shown in Fig. 14. The retrieved result table 600 collects
and groups found scenes for each same scene and
manages the entry of each group. A group of same
scenes is obtained as previously explained in Fig. 7.
Each of found scenes is represented by a retrieved
segment structure 700 and the same scenes represent one
group in the connection list format that the scenes have
mutually pointers. Pointers to same scenes forming a
connection list are stored in 704 and the top frame
number of each segment is stored in 702. A pointer to
the retrieval segment structure which is the top of the
connection list representing a group is stored in 602 as
an entry of the group. In the same group, the segment
lengths of all scenes in the group are the same, so that
they are paired up with the entry and stored in 604.
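
The grouping described above can be sketched in Python as a singly
linked list per group. The class and function names are assumptions
made for illustration, and only the reference numerals in the comments
come from the description of Fig. 14.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RetrievedSegment:                 # retrieved segment structure 700
    top_frame: int                      # 702: top frame number
    next_same: Optional["RetrievedSegment"] = None  # 704: next same scene


@dataclass
class GroupEntry:                       # one entry of the result table 600
    head: RetrievedSegment              # 602: top of the connection list
    length: int                         # 604: shared segment length


def add_scene(entry: GroupEntry, top_frame: int) -> None:
    """Link a newly found occurrence of the scene at the head of the group."""
    entry.head = RetrievedSegment(top_frame, entry.head)


def top_frames(entry: GroupEntry) -> List[int]:
    """Walk the connection list and collect every top frame number."""
    frames = []
    node = entry.head
    while node is not None:
        frames.append(node.top_frame)
        node = node.next_same
    return frames
```

Because all scenes of one group have the same length, the length is
kept once in the entry rather than in every segment structure.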

When the aforementioned processes are repeated,
a scene which appeared once in the past is detected the
moment it appears once again and the top and length of
the segment are positioned with the frame accuracy.
The top of the segment is a frame in which the starting
frame offset for comparison of the candidate structure is
added to the frame number of the segment indicated by
the starting segment number of comparison of the
candidate structure and the length is the value of the
matching frame number counter itself. Hereafter, by collecting
each same segment, automatic self organization can be
realized. However, in the case of a scene that a still
image continues for a long time, a problem also arises
that by this method reducing the feature of each frame,
the characteristic time change of the feature cannot be
obtained and the probability of matching with another
still image scene by mistake increases. If this occurs,
needless to say, it can be solved by increasing the
feature for each frame image. Also in the case of a scene
that the feature changes little, even if a shift of several
frames occurs, the features can match with each other.
In such a case, a plurality of segments are overlapped
and detected in the same range. As a typical example of
it, there is a case that an image just inputted matches
with a segment a little before in the same cut (one of the
units constituting an image, a collected-image segment
continuously photographed by a camera). The reason is
that the frames in the same cut are well similar to each
other on an image basis due to the redundancy of
images. If this occurs, by introducing the known
detection method for the cut change timing and performing a
process of not regarding as a match in the same cut, the
problem can be avoided.
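
The rule stated above for the top and the length of a detected segment
amounts to one addition. The following Python sketch assumes that the
frame numbers 304 of the feature table are available as a list indexed
by segment number. The function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def locate_scene(segment_start_frames: Sequence[int],
                 start_segment: int,
                 start_offset: int,
                 matched_frames: int) -> Tuple[int, int]:
    """Return (top frame, length) of a detected scene.

    top    = frame number of the starting segment (502) + offset (508)
    length = value of the matching frame number counter (506)
    """
    top = segment_start_frames[start_segment] + start_offset
    return top, matched_frames
```

For segments that start at frames 0, 12, and 40, a candidate
registered at segment 1 with an offset of 3 and a counter value of 25
is located at frame 15 with a length of 25 frames.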

Fig. 15 is a conceptual diagram showing an
embodiment of a next generation video recorder system using
the present invention, particularly the method shown in
Fig. 8. The system records video of a TV program and
also executes the function of the present invention at
the same time. Address information such as a frame
number is assigned to each frame of video to be
recorded, and the address information is used as the
frame number 304 of the feature table 300 which is
generated by the present invention, and a one-to-one
synchronization is established between the video data and
the feature table. When the recording ends, the feature
table and various variables used in the present invention
are stored in a nonvolatile storage so as to be read and
restarted when the next recording starts. By doing this,
it is possible to newly input images, compare them with
the images already stored in the video archive in real
time at the same time, and automatically associate the
same scenes with each other. For example, if a stored
program and the inputted images match in the theme song
portion, they are sequential programs
and can be automatically collected and arranged as a
same classification. If, when sequential programs are
watched for the first time, information is assigned as a
common attribute of the whole sequential programs, it is
possible to allow an image just inputted to immediately
share the information. As mentioned previously, also a
commercial message appearing repeatedly can be
detected and skipped. However, only based on a
commercial message existing in an image recorded and
stored, only a limited number of commercial messages
can be detected. Therefore, even when no images are
recorded, images are checked for 24 hours, and a
commercial portion is detected from a repetitive scene, and
with respect to the images of the commercial portion,
although the images are not recorded, only a feature
table is generated and recorded. By doing this, more
commercial messages can be detected with the image
capacity kept unchanged and a commercial message
can be skipped more securely. As mentioned above,
when the present invention is mounted in the next
generation video recorder system, automatic arrangement
of a recorded program and automatic skipping of a
commercial message can be simply executed and the
usability is extremely improved. In the aforementioned
embodiment, it is emphasized that broadcasting images
can be set as an object. However, needless to say, even
images stored in a file may be set as an object.
Fig. 16 shows an embodiment of a display screen
used for interaction with a user. A film image of video is
played back and displayed on a monitor window 50 on
the display of the computer. As a window displayed on
the same screen, there are a window 52 for displaying a
list of typical frame images among images, a text
window 55 for inputting attributes of images and scenes,
and a window 54 for displaying retrieved results in
addition to the window 50. Retrieved results may be
displayed on the window 52. These windows can be moved
to an optional position on the screen by operating a
cursor 53 which can be freely moved by the mouse which
is one of the pointing device 3. To input text, the
keyboard 4 is used. A typical frame displayed on the
window 52 is, for example, the top frame of each cut when
an image is divided in cut-wise. Buttons 51 are buttons
for controlling the playback status of an image and when
the buttons are clicked by the mouse, playback, fast
feed, or rewinding of images can be controlled. Scenes
to be played back can be continuously selected by
clicking the typical frame images displayed as a list on the
window 52. In this case, as video to be played back,
images outputted by the video reproducing apparatus 5
connected to the computer may be used or digitized
images registered in an external information storage
may be used. When the video reproducing apparatus 5
is used, the frame number at the top of a scene is sent
to the video reproducing apparatus and the playback is
started from the scene corresponding to the frame
number. When the playback reaches the frame number
at the end of the scene, an instruction for suspending
the playback is sent to the video reproducing apparatus
5. The same may be basically said with a digitized
image, though digital video data is read and then it is
converted to drawing data for a computer and displayed
as a kind of graphic. When the display process for one
frame ends, the display process of the next frame is
continuously executed and by doing this, moving picture
images are displayed. In accordance with the time
required for the display process, the number of frame
images to be displayed for a fixed time is adjusted so as
to prevent images from rather fast feed or rather slow
feed. On the monitor window 50, images from the
broadcast receiver 7 can be also displayed.

The operation procedure for video retrieval by a
user using the screen shown in Fig. 16 will be described
hereunder. Firstly, he specifies an image to be queried.
The simplest method is a method for executing fast feed
or rewinding using the operation buttons 51 and finding
an optional scene by checking images displayed on the
monitor window 50. The list of typical frames arranged
on the window 52 is equivalent to the contents or
indexes of a book and by referring to it, he can find a
desired scene more quickly. To specify a scene, there is
no need to accurately specify the range of the scene
and it is desirable to specify an optional frame included
in the scene. In this case, it may be specified by clicking
the frame displayed on the monitor window 50 by the
mouse. If a frame image included in the image to be
queried is displayed in the list of typical frames on the
window 52, it may be clicked by the mouse. Next, on the
text window 55, the user inputs and registers attribute
information such as the selected scene, title of the
whole image, and person's name from the keyboard.
The repetition time of registration is optional and if there
is no need to reuse the attribute information hereafter,
there is no need to register the attribute information at
all. Finally, the user presents a retrieval start request. It
can be done by clicking the OK button of the text window
55. By doing this, the system starts the retrieval
process. The system imaginarily generates a segment with
a fixed length having the specified frame just in the
middle thereof and applies the segment to the retrieval
method of the present invention as an image to be
queried. The target image may be newly inputted from the
video reproducing apparatus. If it is an image which is
already registered as a data base and whose feature
table is generated, the comparison process is
performed for the feature table. In this case, if the frame
specified first is included in the segment of the obtained
retrieved result, it is the retrieved result. Furthermore, it
is checked whether it is a partial match or a match of the
whole segment. In the case of a match of the whole
segment, it is possible to spread the segment forward and
backward and accurately obtain the matched segment.
This is a retrieving method utilizing the advantage of the
method of the present invention which can search for a
partially matched segment at high speed.
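
The imaginary query segment of fixed length that has the specified
frame in its middle can be sketched in Python as follows. The function
name is hypothetical, and the clipping of the window at the beginning
and the end of the video is an assumption, since the description does
not state how frames near the boundaries are handled.

```python
from typing import Tuple


def query_window(frame: int, length: int, total_frames: int) -> Tuple[int, int]:
    """Return the half-open frame range [start, end) used as the query.

    The window is centred on the clicked frame and is clipped to the
    range of valid frame numbers (assumed boundary handling).
    """
    start = max(0, frame - length // 2)
    end = min(total_frames, start + length)
    return start, end
```

For a clicked frame 100 and a window length of 30 frames the range is
85 to 115. Near the start or the end of the video the window is
shortened or shifted rather than centred.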

Retrieved results are displayed on the window 54.
Display contents are attribute information, time
information, and others. Or, retrieved results can be graphically
displayed in the format shown in Fig. 17. Fig. 17 is an
enlarged view of the window 52 and numeral 800
indicates an icon image of each typical frame. When a
horizontal bar 806 is put under an icon image, it is found
that a retrieved result exists in the scene corresponding
to the icon image. When a retrieved result spans a
plurality of scenes of an icon image, the bar becomes
longer for the part. The bar is classified by a color or a
hatching pattern. For a plurality of scenes found by
retrieval of the same scene, the same color is displayed.
On the other hand, for a retrieved result of a scene and
a retrieved result of another scene, different colors are
displayed. The list of typical frames can be used as
contents or indexes of images as mentioned above and is
very useful for finding an image to be queried. However,
a dilemma arises that the typical frames are not all
images included in video and if all images are tabulated,
it is difficult to find a desired image from them.
Therefore, it can be considered to extract typical
characteristics of scenes indicated by the typical frames by
analyzing video and for example, to find video of a part
not included in images of the typical frames by
displaying each icon image 800 together with information 802
representing characteristics and time information 804.
Such information representing scene characteristics
includes existence of a person, camera work (zoom,
pan, tilt, etc.), existence of special effect (fade in or out,
dissolve, wipe, etc.), existence of title, and others. With
respect to the image recognition method for detecting
images, Japanese Patent Application Laid-open
7-210409 (applied on Aug. 18, 1995) applied by the
inventors of the present invention can be used. The related
disclosure of Japanese Patent Application No.
7-210409 is incorporated herein by reference. When the
method of the present invention is applied, it can be
useful to dissolve the dilemma of the list of typical
frames by another approach. With respect to repetitive
scenes, not the whole scenes but some of them may be
included in the list of typical frames. For example, in Fig.
18, when one of the repetitive scenes is clicked and
retrieved by the cursor 53, scenes having the same
video part as that of the scene are all found and
indicated to the user. The retrieved result is indicated in a
form of emphasizing the icon image of the scene
including the retrieved segment, for example, like a star mark
810 superimposed on an icon image 808. In this case, if
the icon image itself to be displayed is replaced with a
frame image in the retrieved segment, the indication is
made more clearly understandable. By doing this, if
there is only one image of the same scene as the scene
to be found in the list of typical frames, it is possible to
find a desired scene by the help of it and the
serviceableness of the list of typical frames is enhanced. The
same method can be applied to the video displayed on
the monitor window 50 and it is also possible to specify
a frame displayed by clicking, retrieve the same scenes
as the scene including the frame, and jump to one of the
found scenes. To realize such a process, a troublesome
preparation such as setting of a link node is
conventionally necessary. However, if the method of the present
invention is used, very quick retrieval is available, so
that it is desirable to execute retrieval when necessary
and no preparation is necessary.

To execute the self organization process shown in
the block diagram in Fig. 9, the user does not need to
execute any special process for retrieval and if he just
inputs an image, the computer automatically executes
the process.

In the above explanation, the method for retrieving
on the basis of image characteristics of video is
described. However, voice characteristics may be used
and needless to say, to not only video but also media
which can be successively handled, this retrieval
method can be applied.

Fig. 19 shows an example that the image retrieval
art of the present invention is applied to a video camera.
When power is turned on by a power switch 1961
installed in a process input unit 1960 and picture
recording is instructed by a picture recording button 1962, a
voice, image input processor 1910 performs processes
of inputting a voice signal from a microphone 1911 and
an image signal from a camera 1912. The process of
the voice, image input processor includes the A-D
conversion process and compression process for inputted
voice and image signals. A feature extraction unit 1970
extracts frame-wise features from an inputted image
signal. The process contents are the same as those of
the frame feature extractor 106 shown in Figs. 2 and 9.
The extracted features are stored in a memory 1940 as
a feature table. The memory 1940 uses a built-in
semiconductor memory and a removable memory card.
Inputted voice and image signals are retained in the
memory 1940, read from the memory 1940 by a
playback instruction from a playback button 1963, and
subjected to the expanding process for signal compression
and the D-A conversion process by the voice, image
output processor, and images are outputted to a display
screen 1921, and voice is outputted from a speaker
1922. A controller 1930 manages and controls the
whole signal process of the video camera. With respect
to an inputted image, the feature thereof is extracted for
each frame and stored in the memory. The controller
1930 compares the feature of an inputted image with
the features of past frames retained in the memory
1940. The comparison process may be performed in the
same way as with the feature comparator 130 shown in
Figs. 2 and 9. As a result of comparison, the segment of
scenes having a similar feature is retained in the
memory 1940 in the same format as that of the retrieved
result table (128 shown in Figs. 2 and 9). Numeral 1950
indicates a terminal for supplying power for driving the
video camera and a battery may be mounted. An image
retrieval menu button 1964 instructs a brief editing
process such as rearrangement or deletion of scenes or a
process of instructing a desired scene and retrieving
and playing back similar scenes by pressing the button
1964 several times on the display screen 1921 on which
a recorded moving picture image is displayed, for
example, like Figs. 16, 17, and 18. With respect to the art for
detecting the changing point of a moving picture image
used for sorting of scenes, Japanese Patent Application
Laid-open 7-32027 (applied on Feb. 21, 1995) applied
by the inventors of the present invention can be referred
to. The related disclosure of Japanese Patent
Application No. 7-32027 is incorporated herein by reference.
Scenes are retrieved by use of the image feature
comparison process executed in Figs. 2 and 9. For such a
video camera, it is necessary to adjust the conditions of
the feature comparison process rather loosely. The
reason is that unlike a TV program, when a user generally
picks up images with a video camera, he scarcely picks
up exactly same images. Therefore, when similar
scenes or persons in the same style of dress are
photographed in a similar size, the comparison condition is
set so that they are retrieved as similar scenes.
Picked-up images are analyzed at the same time with recording
and grouping for each scene and indexing between
similar scenes are completed, so that recorded images can
be edited immediately after picking up and the usability
by a user is improved.

Effects of the invention

According to the present invention, by the
aforementioned method, redundant segments with an almost
same feature continued are collected and compared
into a unit. Therefore, there is no need to execute
comparison for each frame, and the calculation amount can
be greatly reduced, and a form that comparison is
falsely executed between the feature series in
frame-wise is taken at the same time, so that the method is
characterized in that the same image segment can be
specified with the frame accuracy. Whenever a frame is
inputted, only the frame is compared, so that the
processing amount for one frame input is made smaller
and the method is suitable for processing of images
requiring the real time including broadcast images. A
plurality of image parts detected at the same time are
exactly same images, so that when they are stored as a
set, if a request to search one partial image is
presented, the retrieval is completed by indicating another
partial image of the set and a very quick response can
be expected.

Claims 

1. A signal series retrieving method in an information
processing system, which includes time sequential
signal input means, a time sequential signal process
controller and a storage, comprising the steps of:

   sequentially inputting time sequential signals;

   sequentially extracting features in each predetermined
   period of said input time sequential signals;

   converting said features sequentially extracted
   into a feature series corresponding to said
   input predetermined period series;

   compressing said feature series in the direction
   of the time axis;

   storing said compressed feature series in said
   storage;

   sequentially extracting features from said time
   sequential signals to be retrieved in each predetermined
   period of said input time sequential signals;

   sequentially comparing said features of said
   time sequential signals to be retrieved in each
   predetermined period with said stored compressed
   feature series;

   storing a progress state of said comparison;
   and

   retrieving a signal series matching with said
   progress state from said time sequential signals
   to be retrieved on the basis of said comparison
   result between said stored progress
   state of said comparison and said features of
   said time sequential signals to be retrieved in
   each predetermined period.

2. An image retrieving method in an information
processing system, which includes image input
means, an image process controller and a storage,
comprising the steps of:

sequentially inputting images for each frame;

sequentially extracting features from said input
frame images;

converting said features sequentially extracted
into a feature series corresponding to said
input frame image series;

compressing said feature series in the direction
of the time axis;

storing said compressed feature series in said
storage;

sequentially extracting features from said
images to be retrieved for each said input
frame;

sequentially comparing said features of said
images to be retrieved for each frame with said
stored compressed feature series;

storing a progress state of said comparison;
and

retrieving image scenes matching with said
progress state from said images to be retrieved
on the basis of said comparison result between
said stored progress state of said comparison
and said features of said images to be retrieved
for each frame.

3. The method of claim 2, wherein 

the stored progress state of said comparison is 
updated on the basis of a comparison result 
with said frame features of said succeeding 
images to be retrieved, and 
image scenes matching with said updated
progress state are retrieved on the basis of said 
updated comparison result. 

4. The method of Claim 3, wherein with respect to 
storage and update of said progress state of said 
comparison, the number of the top frame in which a 
comparison match may occur is provisionally 
recorded, and when the comparison match contin- 
ues, the frame number to be compared is updated, 
and when the possibility of comparison match is 
lost, said frame number to be compared is deleted. 
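
As an illustration only, the bookkeeping described in claim 4 might be sketched as follows. The sketch assumes scalar per-frame features, a fixed stored series and an absolute-difference tolerance; the names `ProgressState`, `Match`, `feed` and `flush` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Match:
    stored_top: int  # top frame of the matching part in the stored series
    query_top: int   # top frame of the matching part in the input stream
    length: int      # number of consecutive frames that matched


class ProgressState:
    """Hypothetical holder for the progress state of the comparison."""

    def __init__(self, stored: Sequence[float], tolerance: float) -> None:
        self.stored = list(stored)
        self.tolerance = tolerance
        self.frame_no = 0  # number of the next input frame
        # key: stored frame number to be compared next
        # value: (top frame in stored series, top frame in input stream)
        self.candidates: Dict[int, Tuple[int, int]] = {}

    def _same(self, a: float, b: float) -> bool:
        return abs(a - b) <= self.tolerance

    def feed(self, feature: float) -> List[Match]:
        """Compare one new input frame; return the candidates that just ended."""
        t = self.frame_no
        self.frame_no += 1
        ended: List[Match] = []
        survivors: Dict[int, Tuple[int, int]] = {}
        for nxt, (top_s, top_q) in self.candidates.items():
            if nxt < len(self.stored) and self._same(self.stored[nxt], feature):
                # The match continues: update the frame number to be compared.
                survivors[nxt + 1] = (top_s, top_q)
            else:
                # The possibility of a match is lost: delete the candidate.
                ended.append(Match(top_s, top_q, nxt - top_s))
        for j, s in enumerate(self.stored):
            # Provisionally record every top frame where a match may begin,
            # unless a surviving candidate already runs through frame j.
            if self._same(s, feature) and (j + 1) not in survivors:
                survivors[j + 1] = (j, t)
        self.candidates = survivors
        return ended

    def flush(self) -> List[Match]:
        """Report the candidates still alive at the end of the input."""
        ended = [Match(top_s, top_q, nxt - top_s)
                 for nxt, (top_s, top_q) in self.candidates.items()]
        self.candidates = {}
        return ended
```

Only the newly input frame is compared on each call, so the work per frame depends on the number of live candidates and on the scan for new starting points, not on how long any candidate has already been matching. A real implementation would presumably scan the compressed series rather than every stored frame.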

5. The method of Claim 2 or 3, wherein a statistic of 
brightness or colour is used as said feature. 
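
A minimal sketch of one such statistic, the mean brightness of a frame, is given below. It assumes that a frame is a list of rows of (R, G, B) tuples and that brightness is computed with the common Rec. 601 luma weights; both choices are assumptions made for illustration, and the function name `mean_brightness` is hypothetical.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]


def mean_brightness(frame: Sequence[Sequence[Pixel]]) -> float:
    """Return the average luma of all pixels in one frame."""
    total = 0.0
    count = 0
    for row in frame:
        for r, g, b in row:
            # Rec. 601 luma weights (an assumed definition of brightness).
            total += 0.299 * r + 0.587 * g + 0.114 * b
            count += 1
    if count == 0:
        raise ValueError("frame contains no pixels")
    return total / count
```

A single scalar per frame keeps the feature series small; a colour histogram would be another statistic of the kind the claim names, at the cost of a larger feature per frame.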

6. The method of Claim 2 or 3, wherein said compression
of said feature series in the direction of the
time axis is executed on the assumption that when 
the difference between the feature of a frame image 
and the feature of the next frame image is within a 
predetermined tolerance, the features are the 
same. 
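
The compression rule of claim 6 can be sketched as a run-length grouping, shown below under the assumption of scalar features and an absolute-difference tolerance. The names `Run` and `compress_features` are hypothetical. Following the wording of the claim, each frame is compared with the frame immediately before it, so a slow drift can stay inside one segment; comparing with the first frame of the segment instead would be a different design choice.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Run:
    feature: float  # feature of the first frame of the segment
    start: int      # frame number of the first frame of the segment
    length: int     # number of consecutive frames in the segment


def compress_features(features: Sequence[float], tolerance: float) -> List[Run]:
    """Collect consecutive frames with almost the same feature into runs."""
    runs: List[Run] = []
    prev: Optional[float] = None
    for i, f in enumerate(features):
        if prev is not None and abs(f - prev) <= tolerance:
            # Within tolerance of the previous frame: treat as the same feature.
            runs[-1].length += 1
        else:
            # The feature changed: open a new segment at this frame.
            runs.append(Run(feature=f, start=i, length=1))
        prev = f
    return runs
```

For example, the six-frame series `[10, 11, 10, 30, 31, 50]` with a tolerance of 2 is reduced to three segments, of lengths 3, 2 and 1.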

7. The method of Claim 2 or 3, wherein it is assumed,
with respect to a match with said progress state, or
updated progress state, respectively, of said feature
series, that when features with more than a predetermined
length match with each other, a comparison
result match occurs.

8. The method of Claim 2 or 3, wherein when a match
with said progress state, or updated progress state,
respectively, of said comparison occurs in a plurality
of locations:

the frame image series in said plurality of locations
is stored so as to be accessed as a
related set, or

sequential programs on the air are classified on
the basis of the frame image series in said plurality
of locations, or

a specific image on the air is detected on the
basis of the frame image series matched in
said plurality of locations and the time length
thereof.
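
The first alternative of claim 8, storing matched locations so that they can be accessed as a related set, might be sketched as below. The sketch assumes that a segment is identified by a `(top_frame, length)` tuple; the class `RelatedSets` and its methods `link` and `related` are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Segment = Tuple[int, int]  # (top frame number, length in frames)


class RelatedSets:
    """Keep matched segments grouped so that one member leads to the others."""

    def __init__(self) -> None:
        self._group_of: Dict[Segment, int] = {}
        self._members: Dict[int, Set[Segment]] = defaultdict(set)
        self._next_id = 0

    def link(self, a: Segment, b: Segment) -> None:
        """Record that segments a and b were found to match each other."""
        ga = self._group_of.get(a)
        gb = self._group_of.get(b)
        if ga is None and gb is None:
            g = self._next_id
            self._next_id += 1
        elif ga is None:
            g = gb
        elif gb is None:
            g = ga
        else:
            g = ga
            if ga != gb:
                # Both segments already belong to sets: merge b's set into a's.
                for s in self._members.pop(gb):
                    self._group_of[s] = ga
                    self._members[ga].add(s)
        for s in (a, b):
            self._group_of[s] = g
            self._members[g].add(s)

    def related(self, seg: Segment) -> Set[Segment]:
        """Return the other segments stored in the same set as seg."""
        g = self._group_of.get(seg)
        if g is None:
            return set()
        return self._members[g] - {seg}
```

With such a structure, a request concerning one partial image can be answered by a dictionary lookup that returns the other members of its set, without running the comparison again.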

9. A signal series retrieving system in an information
processor, which includes time sequential signal
input means, a time sequential signal process controller
and a storage, comprising:

means for sequentially inputting time sequential
signals;

means for sequentially extracting features in
each predetermined period of said input time
sequential signals;

means for converting said features sequentially
extracted into a feature series corresponding to
said input predetermined period series;

means for compressing said feature series in the
direction of the time axis;

means for storing said compressed feature series
in said storage;

means for sequentially extracting features from
said time sequential signals to be retrieved in
each predetermined period of said input time
sequential signals;

means for sequentially comparing said features of
said time sequential signals to be retrieved in
each predetermined period with said stored
compressed feature series;

means for storing a progress state of said
comparison; and

means for retrieving a signal series matching
with said progress state from said time sequential
signals to be retrieved on the basis of said
comparison result between said stored progress
state of said comparison and said features of said
time sequential signals to be retrieved in each
predetermined period.

10. A signal series retrieving system in an information
processor, which includes image input means, an
image process controller and a storage, comprising:

means for sequentially inputting images for
each frame;

means for sequentially extracting features from
said input frame images;

means for converting said features sequentially
extracted into a feature series corresponding to
said input frame image series;

means for compressing said feature series in
the direction of the time axis;

means for storing said compressed feature
series in said storage;

means for sequentially extracting features from
said images to be retrieved for each said input
frame;

means for sequentially comparing said features
of said images to be retrieved for each
frame with said stored compressed feature
series;

means for storing a progress state of said
comparison; and

means for retrieving image scenes matching
with said progress state from said images to be
retrieved on the basis of said comparison result
between said stored progress state of said
comparison and said features of said images to
be retrieved for each frame.

11. The system of claim 10, further comprising means
for updating said stored progress state of said
comparison on the basis of a comparison result with
said frame features of said succeeding images to
be retrieved, wherein said retrieving means are
adapted to retrieve image scenes matching with
said updated progress state on the basis of said
updated comparison result.

12. A program storage enabling execution of a process
by an information processor, which includes time
sequential signal input means, a time sequential
signal process controller and a storage, comprising:

a storage medium storing a program including
the following processes which can be read by
said information processor;

a process of sequentially inputting time
sequential signals;

a process of sequentially extracting features in
each predetermined period of said input time
sequential signals;

a process of converting said features sequentially
extracted into a feature series corresponding
to said input predetermined period
series;

a process of compressing said feature series in
the direction of the time axis;

a process of storing said compressed feature
series in said storage;

a process of sequentially extracting features
from said time sequential signals to be
retrieved in each predetermined period of said
input time sequential signals;

a process of sequentially comparing said features
of said time sequential signals to be
retrieved in each predetermined period with
said stored compressed feature series;

a process of storing a progress state of said
comparison; and

a process of retrieving a signal series matching
with said progress state from said time sequential
signals to be retrieved on the basis of said
comparison result between said stored
progress state of said comparison and said
features of said time sequential signals to be
retrieved in each predetermined period.

13. A program storage enabling execution of a process 
by an information processor, which includes image 
input means, an image process controller and a 
storage, comprising: 






a storage medium storing a program including
the following processes which can be read by
said information processor;

a process of sequentially inputting images for
each frame;

a process of sequentially extracting features
from said input frame images;

a process of converting said features sequentially
extracted into a feature series corresponding
to said input frame image series;

a process of compressing said feature series in
the direction of the time axis;

a process of storing said compressed feature
series in said storage;

a process of sequentially extracting features
from said images to be retrieved for each said
input frame;

a process of sequentially comparing said features
of said images to be retrieved for each
frame with said stored compressed feature
series;

a process of storing a progress state of said
comparison; and

a process of retrieving image scenes matching
with said progress state from said images to be
retrieved on the basis of said comparison result
between said stored progress state of said
comparison and said features of said images to
be retrieved for each frame.


14. The program storage of claim 13, wherein 

the stored progress state of said comparison is 
updated on the basis of a comparison result 
with said frame features of said succeeding
images to be retrieved, and
image scenes matching with said updated
progress state are retrieved on the basis of said 
updated comparison result. 


15. An information processor, comprising:

a display;

a memory having a process program and a
data retaining area;

control means for performing an image input
process, an image retrieval process, and a display
process of a retrieved image on said display
according to said process program;

means for storing a moving picture image input
by said image input process in said memory
frame-wise;

means for retrieving a moving picture image
segment 2 which is regarded as the same as a
moving picture image segment 1 with a predetermined
length from said memory by the
frame series of said stored moving picture
image when a frame is newly input by said
image retrieval process and associating said
moving picture image segments 1 and 2; and

means for displaying said associated moving
picture image segments 1 and 2 in distinction
from other moving picture image segments
when a moving picture image input by said display
process of a retrieved image on said display
is simply displayed on said display.

16. A video camera, comprising:

a camera for inputting an image;

an input processor of said image;

a storage for storing an image input from said
camera;

an output processor for reproducing and outputting
an image stored in said storage;

a feature extractor for extracting the feature of
an input image for each frame;

a memory area for tabling and retaining said
extracted feature;

a process of comparing the feature of an input
image with the feature on the table; and

a process of associating frames having the feature
agreeing with a predetermined comparison
condition as similar images.